2  Overview

This notebook outlines the complete workflow for processing NDVI (Normalized Difference Vegetation Index) data from the deep-extremes-minicubes dataset provided by the Remote Sensing Centre for Earth System Research. The steps include data preprocessing, dataset initialization, data processing, and sanity checks. Additionally, we perform a train/test split based on the analysis of missing values and implement strategies to handle these missing values, preparing the data for training.

2.1 DeepExtremes Minicubes

The DeepExtremes project provides a dataset of minicubes: small, manageable data cubes that contain various types of remote sensing data. Each minicube is a 3D array with dimensions 495 x 128 x 128, representing time, latitude, and longitude. The minicubes are designed to facilitate the study of extreme events and their impacts on the Earth system.

2.1.1 Key Features of the Minicubes:

  • Spatial Resolution: Each minicube has a spatial resolution of 20 meters, i.e. a 128 x 128 pixel footprint of roughly 2.56 km x 2.56 km.
  • Temporal Coverage: Data spans from January 1, 2016, to October 10, 2022, with a time resolution of 5 days.
  • Spectral Bands: The minicubes include reflectance data from several Sentinel-2 spectral bands (B02, B03, B04, B05, B06, B07, B8A), as well as additional layers such as the Scene Classification Layer (SCL) and cloud masks.
  • Additional Variables: ERA5 reanalysis data (e.g., evaporation, surface pressure, temperature), DEM (Digital Elevation Model) data, and event labels for extreme occurrences.

Each minicube is rich with metadata, including details on the geographic location, creation date, data processing steps, and variable-specific attributes. This metadata ensures the data’s integrity, traceability, and usability for scientific analysis.
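
To make the structure of a single minicube more concrete, the following is a minimal sketch of how one might open and inspect one cube directly, assuming the minicubes are stored as zarr groups under the S3 bucket path. The cube path in the usage example is hypothetical; the notebook itself accesses the cubes through the import_cubes helpers introduced below.

# Minimal, illustrative sketch (the cube path in the usage example is hypothetical)
import s3fs
import xarray as xr

def open_minicube(fs: s3fs.S3FileSystem, cube_path: str) -> xr.Dataset:
    """Open a single minicube zarr store from S3 as an xarray Dataset."""
    return xr.open_zarr(fs.get_mapper(cube_path))

# Example usage:
# fs = s3fs.S3FileSystem(key=AWS_ACCESS_KEY_ID, secret=AWS_SECRET_ACCESS_KEY)
# cube = open_minicube(fs, "deepextremes-minicubes/1.2.2/<cube_id>.zarr")
# print(cube.dims)             # expect 495 time steps on a 128 x 128 grid
# print(list(cube.data_vars))  # e.g. B02 ... B8A, SCL, cloud mask, ERA5 variables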

2.2 Description and Overview

2.2.1 1. Data Import

  • Import Data: The dataset is imported from an S3 bucket using AWS credentials.
  • Remove Invalid Cubes: Cubes with invalid data are filtered out.
  • Data Distribution: Cubes are split into training, validation, and test sets based on a predefined split table.

2.2.2 2. Dataset Initialization

  • Initialize Dataset: The dataset is initialized with minicubes, each having dimensions of 495 (time periods) x 128 x 128 (pixels).
  • NDVI Calculation: The Python class that initializes the dataset calculates the NDVI for each pixel, derived from the spectral bands B8A (near-infrared) and B04 (red).

2.2.3 3. Data Processing

  • Chunking Data: Data is processed in chunks to prevent memory overflow.
  • NDVI Preprocessing: NDVI values are preprocessed, and pixels with an average NDVI below 0.2 are masked.
  • Data Saving: Processed data is saved in float32 format to save memory.

2.2.4 4. Sanity Checks

  • Cube Integrity: Verify the shape and dimensions of each cube.
  • NDVI Distribution: Analyze the distribution of NDVI values.
  • Low NDVI Values: Ensure no pixels have an average NDVI below 0.2.
  • Cube Class Comparison: Compare cube classes with NDVI values and masked values.

2.2.5 5. Train/Test Split

  • Identify Suitable Starting Date: Determine the starting date for training data and set the time period for train and test data.
  • Perform Train and Test Split: Split the data into training and test sets and save it to the appropriate directories.

2.2.6 6. Handling of Missing Data

  • Approach A: Handling NaNs with Outliers and Cloud Mask Integration
  • Approach B: Interpolation with STL Decomposition

2.3 1. Data Import

First, we import the necessary packages, load the deep-extremes-minicubes dataset from the S3 bucket, remove invalid cubes, and split the dataset.

# Define base_dir for consistent path management
from pathlib import Path
import os

notebook_dir = Path(os.getcwd()).resolve()

base_dir = notebook_dir.parent 

print(base_dir)
/home/cgoehler/team-extra/ndvi-time-series-prediction
# Import necessary packages and custom functions
import sys
sys.path.insert(0, os.path.join(base_dir, "src", "data_processing"))
import s3fs
import itertools
import zarr
import math
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import xarray as xr
from import_cubes import *
from helper import *
import torch
from my_loader import DeepCubeTSDatasetBasti
from process_ndvi import *
from sanity_checks import *
from stl_interpolation import *
from statsmodels.tsa.seasonal import STL
# AWS Credentials
AWS_ACCESS_KEY_ID = "***"
AWS_SECRET_ACCESS_KEY = "***"
AWS_DEFAULT_REGION = "eu-central-1"
# Initialize S3FileSystem
minicubefs = s3fs.S3FileSystem(key=AWS_ACCESS_KEY_ID, secret=AWS_SECRET_ACCESS_KEY)
# Read registry from S3FileSystem
bucket_path = "s3://deepextremes-minicubes/1.2.2"
registry_df = read_registry(bucket_path, minicubefs)
registry_df.shape
(5593, 1)
# Remove bad cubes
print("Initial number of Cubes: ", registry_df.shape[0])
registry_df_filtered = remove_cubes(registry_df)
print("Number of Cubes after removal: ", registry_df_filtered.shape[0])
Initial number of Cubes:  5593
Number of Cubes after removal:  5397

To ensure a balanced data distribution over the entire globe while reducing the total number of cubes, we use the split provided in the split_table.csv file.
Subsequently, we will only use the cubes within the training set. Since this is a time-series prediction task, our train/validation/test split will be applied separately to each cube by splitting each cube's time series individually.

# Split cubes in training, validation and testset
split_table_path = base_dir / "csvs" / "split_table.csv"
traincubes, valcubes, testcubes = split_datasets(
    cube_registry=registry_df_filtered, split_table_path=split_table_path
)
/home/bastiloeblein/team-extra/ndvi-time-series-prediction/Notebooks/../src/data_processing/import_cubes.py:88: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy
  cube_registry["mc_id"] = cube_registry["mc_id"].apply(preprocess_mc_id)
print(f"Number of entries in the training dataset: {len(traincubes)}")
print(f"Number of entries in the validation dataset: {len(valcubes)}")
print(f"Number of entries in the test dataset: {len(testcubes)}")
Number of entries in the training dataset: 3052
Number of entries in the validation dataset: 666
Number of entries in the test dataset: 97
# Define the S3 bucket name
s3_bucket = "deepextremes-minicubes/1.2.2"
# Get quantile data
quantile_data = get_var_quantiles()
# dict(itertools.islice(quantile_data.items(), 1))

2.4 2. Dataset Initialization

We initialize the dataset containing minicubes (each with dimension 495 (time periods) x 128 x 128 (pixels)) with additional information on pixel-wise NDVI and cube classes (e.g. soil, meadow).

Due to data quality and memory constraints, we decided to use only 100 cubes. We therefore identified the 100 cubes with the most time steps in which more than 80% of the pixels contain non-NaN values.
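
The selection itself was done beforehand and is read from Final_Cubes.csv below. As an illustration of the criterion, the following sketch shows how such a ranking could be computed; the load_ndvi helper and cube_ids list are hypothetical placeholders.

import numpy as np

def count_good_timesteps(ndvi_cube: np.ndarray, threshold: float = 0.8) -> int:
    """ndvi_cube has shape (time, 128, 128); a time step counts as 'good' if the
    fraction of non-NaN pixels exceeds the threshold."""
    valid_fraction = np.mean(~np.isnan(ndvi_cube), axis=(1, 2))  # per time step
    return int(np.sum(valid_fraction > threshold))

# Hypothetical usage:
# scores = {cube_id: count_good_timesteps(load_ndvi(cube_id)) for cube_id in cube_ids}
# best_100 = sorted(scores, key=scores.get, reverse=True)[:100]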

valid_cubes_path = base_dir / "csvs" / "Final_Cubes.csv"
valid_df = pd.read_csv(valid_cubes_path, sep=";")
valid_indices = valid_df["Cube_ID"].to_list()
# Initialize dataset
dataset = DeepCubeTSDatasetBasti(
    minicubefs, bucket_path, traincubes, quantile_data, valid_indices
)
dataset.__repr__
<bound method DeepCubeTSDatasetBasti.__repr__ of <(128 * 128 data points per cube): 16.384 * 100 (number of cubes) = 1638400 (Total number of data points)>>

2.5 3. Data Processing

The data is processed in chunks to manage memory usage.

NDVI values are preprocessed to mask pixels for which NDVI prediction is not meaningful (e.g., pixels that do not contain any vegetation). We therefore mask pixels whose average NDVI over the entire time period is below 0.2 (https://earthobservatory.nasa.gov/features/MeasuringVegetation).
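
As a reference for what the preprocessing functions used below do conceptually, here is a minimal sketch, assuming NDVI = (B8A - B04) / (B8A + B04) and that masked pixels are set to NaN for all time steps; the project's calculate_average_ndvi_for_each_pixel and mask_low_ndvi_values helpers may differ in detail.

import numpy as np

def compute_ndvi(b8a: np.ndarray, b04: np.ndarray) -> np.ndarray:
    """Standard NDVI from near-infrared (B8A) and red (B04) reflectance."""
    return (b8a - b04) / (b8a + b04 + 1e-8)  # small epsilon avoids division by zero

def mask_low_ndvi(ndvi: np.ndarray, threshold: float = 0.2) -> np.ndarray:
    """ndvi has shape (time, x, y); pixels whose temporal mean NDVI (ignoring NaNs)
    falls below the threshold are masked for every time step."""
    avg = np.nanmean(ndvi, axis=0)          # (x, y) average over time
    masked = ndvi.copy()
    masked[:, avg < threshold] = np.nan     # mask the whole time series of low-NDVI pixels
    return masked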

# Process the cubes for the valid indices and save each cube as a NetCDF file
for i in valid_indices:
    print(i)
    ndvi_tensor, cloudmask_tensor, lat, lon, cube_class = dataset[i]
    
    # Calculate the average NDVI for each pixel
    average_ndvi = calculate_average_ndvi_for_each_pixel(ndvi_tensor)

    # Mask low NDVI values
    ndvi_masked = mask_low_ndvi_values(average_ndvi, ndvi_tensor)

    # Convert the PyTorch tensors to NumPy arrays
    ndvi_data = ndvi_masked.numpy()
    cloud_mask_data = cloudmask_tensor.numpy()
    
    # Replace all -9999.0 values with NaN in the NDVI data
    ndvi_data = np.where(ndvi_data == -9999.0, np.nan, ndvi_data)

    # Generate the dates
    dates = pd.date_range(start='2016-01-01', periods=495, freq='5D')
    
    # Create xarray DataArrays
    ndvi = xr.DataArray(ndvi_data, coords=[dates, range(128), range(128)], dims=['time', 'x', 'y'], name='NDVI')
    cloud_mask = xr.DataArray(cloud_mask_data, coords=[dates, range(128), range(128)], dims=['time', 'x', 'y'], name='Cloud_Mask')
    
    # Combine into a Dataset
    data = xr.Dataset({
        'NDVI': ndvi,
        'Cloud_Mask': cloud_mask
    })


    # Add lat and lon as attributes
    data.attrs['lat'] = lat
    data.attrs['lon'] = lon
    data.attrs['class'] = cube_class

    # Add metadata to the variables and dataset
    data['NDVI'].attrs['description'] = 'Normalized Difference Vegetation Index'
    data['Cloud_Mask'].attrs['description'] = 'Cloud Mask (0=clear, 1=cloudy)'
    data.attrs['source'] = f'Cube_{i}'

    # Save to NetCDF (create the output directory if it does not exist yet)
    save_dir = base_dir / "data" / "data_final_updated"
    save_dir.mkdir(parents=True, exist_ok=True)
    data.to_netcdf(save_dir / f'Cube_{i}.nc')

data

2.6 4. Sanity Checks

Finally, we conduct some sanity checks (a minimal sketch of what such checks might look like follows the list):

  1. Cube Integrity: Ensure that the shape and dimensions of each cube are consistent and correct.
    Verify that the number of time steps, rows, and columns match the expected dimensions.

  2. Distribution of NDVI Values: Analyze the distribution of NDVI values to ensure they fall within the expected range [0, 1].
    Check for any unexpected outliers or anomalies in the data.

  3. No Pixels with Average NDVI Below 0.2: Ensure that there are no pixels with an average NDVI value below 0.2 over the entire time period.

  4. Comparison with Cube Class: Compare the cube class with the overall average NDVI (considering all pixels and time periods).
    Additionally, compare the number of masked values with the cube class to check for consistency.
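
The checks are implemented in the sanity_checks module; the following is only a minimal sketch of what they might look like, under the assumption that each check raises an AssertionError on failure (matching the except clause in the cell below).

import numpy as np

def check_cube_integrity(ndvi: np.ndarray, expected_shape: tuple) -> None:
    assert ndvi.shape == expected_shape, f"unexpected shape {ndvi.shape}"

def check_ndvi_distribution(ndvi: np.ndarray) -> None:
    valid = ndvi[~np.isnan(ndvi)]
    assert valid.size > 0, "cube contains no valid NDVI values"
    assert valid.min() >= 0.0 and valid.max() <= 1.0, "NDVI outside [0, 1]"

def check_if_contains_low_values(ndvi: np.ndarray, threshold: float = 0.2):
    """Return the overall average NDVI and the number of masked (NaN) values."""
    avg_per_pixel = np.nanmean(ndvi, axis=0)
    assert np.all(np.isnan(avg_per_pixel) | (avg_per_pixel >= threshold)), \
        "found a pixel with average NDVI below the threshold"
    return float(np.nanmean(ndvi)), int(np.isnan(ndvi).sum())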

# Lists to store errors
error_list = []
bad_cubes = []

# Path to the directory with NDVI chunks
data_dir = base_dir / "data" / "data_final_updated"
data_list = os.listdir(data_dir)
print(len(data_list))

# Loop over all NetCDF files
for nc_file in data_list:
    if nc_file.endswith('.nc'):
        nc_path = os.path.join(data_dir, nc_file)
        print(f"Processing {nc_path}...")

        # Load the current NetCDF file
        data = xr.open_dataset(nc_path)
        ndvi_data = data['NDVI'].values
        
        cube_class = data.attrs.get('class')
        
        # Expected shape of the NDVI cubes
        expected_shape = (495, 128, 128)

        try:
            # Step 1: Check cube integrity
            check_cube_integrity(ndvi_data, expected_shape)

            # Step 2: Check the distribution of NDVI values
            check_ndvi_distribution(ndvi_data)

            # Step 3: Ensure no pixels have an average NDVI below 0.2 and calculate additional metrics
            overall_avg_ndvi, num_masked_values = check_if_contains_low_values(ndvi_data)

            print(f"Cube {nc_file} - Class: {cube_class}, Overall Average NDVI: {overall_avg_ndvi}, Number of Masked Values: {num_masked_values}")
            print(f"Cube {nc_file} passed all sanity checks.\n")

        except AssertionError as e:
            error_message = f"Cube {nc_file} failed sanity check: {str(e)}"
            print(error_message)
            error_list.append(error_message)
        except Exception as e:
            error_message = f"Cube {nc_file} encountered an error: {str(e)}"
            print(error_message)
            bad_cubes.append(nc_path)

# Display all errors
if error_list:
    print("\nSummary of errors:")
    for error in error_list:
        print(error)
else:
    print("\nAll cubes passed sanity checks.")

# Display list of all bad cubes
if bad_cubes:
    print("\nList of bad cubes:")
    for bad_cube in bad_cubes:
        print(bad_cube)

2.7 5. Train/Test Split

In this section, we will perform the train and test split for our time series data. Here’s a structured plan for what we will accomplish:

  1. Identify Suitable Starting Date:
    • We have observed that the initial portion of our time series (particularly in the year 2016) contains a significant number of NaN values.
    • To address this, we will conduct a brief analysis to determine an appropriate starting date for our time series prediction. This step is crucial to ensure we have sufficient data for effective interpolation, especially at the beginning of our time series.
  2. Perform Train/Test Split:
    • Once a reasonable starting date is identified, we will proceed to split the data into train and test sets.
    • The train set will be used to train our model, while the test set will be used to evaluate its performance.

By carefully selecting a starting point and splitting the data, we aim to enhance the accuracy and reliability of our time series predictions and interpolation of missing values.

2.7.1 5.1 Identify Suitable Starting Date

file_path = base_dir / "data" / "data_final_updated"
nc_files = [file for file in os.listdir(file_path) if file.endswith('.nc')]
print(f"Number of files: {len(nc_files)}")
Number of files: 100

Upon analyzing our dataset, we observed that in 2016 only sporadic pixels across all cubes contain NDVI values; the vast majority are NaNs. To ensure the quality and completeness of our data for time series prediction, we will exclude the year 2016 from our dataset.

# Initialize a dictionary to store the results
results = {}

# Iterate over each .nc file and perform the calculations
for nc_file in nc_files:
    full_path = os.path.join(file_path, nc_file)
    ds = xr.open_dataset(full_path)
    
    ndvi_data = ds['NDVI'].values  # Extract NDVI data
    time_data = ds['time'].values  # Extract time data
    
    for i, time_step in enumerate(time_data):
        # Count the number of non-NaN values for the current time step
        non_nan_count = np.sum(~np.isnan(ndvi_data[i, ...]))
        
        # Convert timestamp to readable date format if necessary
        if isinstance(time_step, np.datetime64):
            time_step = str(time_step)
        
        # Store the results
        if time_step in results:
            results[time_step] += non_nan_count
        else:
            results[time_step] = non_nan_count

# Output the results
for time_step, non_nan_count in results.items():
    print(f"Date: {time_step} - Number of non-NaN NDVI values: {non_nan_count}")
Date: 2016-01-01T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-01-06T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-01-11T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-01-16T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-01-21T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-01-26T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-01-31T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-02-05T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-02-10T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-02-15T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-02-20T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-02-25T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-03-01T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-03-06T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-03-11T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-03-16T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-03-21T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-03-26T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-03-31T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-04-05T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-04-10T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-04-15T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-04-20T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-04-25T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-04-30T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-05-05T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-05-10T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-05-15T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-05-20T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-05-25T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-05-30T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-06-04T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-06-09T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-06-14T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-06-19T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-06-24T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-06-29T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-07-04T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-07-09T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-07-14T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-07-19T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-07-24T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-07-29T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-08-03T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-08-08T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-08-13T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-08-18T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-08-23T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-08-28T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-09-02T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-09-07T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-09-12T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-09-17T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-09-22T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-09-27T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-10-02T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-10-07T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-10-12T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-10-17T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-10-22T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-10-27T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-11-01T00:00:00.000000000 - Number of non-NaN NDVI values: 735
Date: 2016-11-06T00:00:00.000000000 - Number of non-NaN NDVI values: 15236
Date: 2016-11-11T00:00:00.000000000 - Number of non-NaN NDVI values: 16074
Date: 2016-11-16T00:00:00.000000000 - Number of non-NaN NDVI values: 29670
Date: 2016-11-21T00:00:00.000000000 - Number of non-NaN NDVI values: 16149
Date: 2016-11-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1212
Date: 2016-12-01T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-12-06T00:00:00.000000000 - Number of non-NaN NDVI values: 29670
Date: 2016-12-11T00:00:00.000000000 - Number of non-NaN NDVI values: 63
Date: 2016-12-16T00:00:00.000000000 - Number of non-NaN NDVI values: 455
Date: 2016-12-21T00:00:00.000000000 - Number of non-NaN NDVI values: 0
Date: 2016-12-26T00:00:00.000000000 - Number of non-NaN NDVI values: 29644
Date: 2016-12-31T00:00:00.000000000 - Number of non-NaN NDVI values: 154887
Date: 2017-01-05T00:00:00.000000000 - Number of non-NaN NDVI values: 258414
Date: 2017-01-10T00:00:00.000000000 - Number of non-NaN NDVI values: 362279
Date: 2017-01-15T00:00:00.000000000 - Number of non-NaN NDVI values: 813709
Date: 2017-01-20T00:00:00.000000000 - Number of non-NaN NDVI values: 174809
Date: 2017-01-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1231339
Date: 2017-01-30T00:00:00.000000000 - Number of non-NaN NDVI values: 638865
Date: 2017-02-04T00:00:00.000000000 - Number of non-NaN NDVI values: 599125
Date: 2017-02-09T00:00:00.000000000 - Number of non-NaN NDVI values: 441039
Date: 2017-02-14T00:00:00.000000000 - Number of non-NaN NDVI values: 650624
Date: 2017-02-19T00:00:00.000000000 - Number of non-NaN NDVI values: 485039
Date: 2017-02-24T00:00:00.000000000 - Number of non-NaN NDVI values: 553548
Date: 2017-03-01T00:00:00.000000000 - Number of non-NaN NDVI values: 290621
Date: 2017-03-06T00:00:00.000000000 - Number of non-NaN NDVI values: 707662
Date: 2017-03-11T00:00:00.000000000 - Number of non-NaN NDVI values: 617145
Date: 2017-03-16T00:00:00.000000000 - Number of non-NaN NDVI values: 601657
Date: 2017-03-21T00:00:00.000000000 - Number of non-NaN NDVI values: 297106
Date: 2017-03-26T00:00:00.000000000 - Number of non-NaN NDVI values: 512963
Date: 2017-03-31T00:00:00.000000000 - Number of non-NaN NDVI values: 460329
Date: 2017-04-05T00:00:00.000000000 - Number of non-NaN NDVI values: 641623
Date: 2017-04-10T00:00:00.000000000 - Number of non-NaN NDVI values: 186476
Date: 2017-04-15T00:00:00.000000000 - Number of non-NaN NDVI values: 864421
Date: 2017-04-20T00:00:00.000000000 - Number of non-NaN NDVI values: 539825
Date: 2017-04-25T00:00:00.000000000 - Number of non-NaN NDVI values: 332112
Date: 2017-04-30T00:00:00.000000000 - Number of non-NaN NDVI values: 413190
Date: 2017-05-05T00:00:00.000000000 - Number of non-NaN NDVI values: 643990
Date: 2017-05-10T00:00:00.000000000 - Number of non-NaN NDVI values: 527493
Date: 2017-05-15T00:00:00.000000000 - Number of non-NaN NDVI values: 668972
Date: 2017-05-20T00:00:00.000000000 - Number of non-NaN NDVI values: 418498
Date: 2017-05-25T00:00:00.000000000 - Number of non-NaN NDVI values: 668569
Date: 2017-05-30T00:00:00.000000000 - Number of non-NaN NDVI values: 440164
Date: 2017-06-04T00:00:00.000000000 - Number of non-NaN NDVI values: 838649
Date: 2017-06-09T00:00:00.000000000 - Number of non-NaN NDVI values: 400726
Date: 2017-06-14T00:00:00.000000000 - Number of non-NaN NDVI values: 802162
Date: 2017-06-19T00:00:00.000000000 - Number of non-NaN NDVI values: 576634
Date: 2017-06-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1064392
Date: 2017-06-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1396831
Date: 2017-07-04T00:00:00.000000000 - Number of non-NaN NDVI values: 904167
Date: 2017-07-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1014362
Date: 2017-07-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1235566
Date: 2017-07-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1039258
Date: 2017-07-24T00:00:00.000000000 - Number of non-NaN NDVI values: 801631
Date: 2017-07-29T00:00:00.000000000 - Number of non-NaN NDVI values: 964584
Date: 2017-08-03T00:00:00.000000000 - Number of non-NaN NDVI values: 711300
Date: 2017-08-08T00:00:00.000000000 - Number of non-NaN NDVI values: 794347
Date: 2017-08-13T00:00:00.000000000 - Number of non-NaN NDVI values: 980194
Date: 2017-08-18T00:00:00.000000000 - Number of non-NaN NDVI values: 921815
Date: 2017-08-23T00:00:00.000000000 - Number of non-NaN NDVI values: 1200440
Date: 2017-08-28T00:00:00.000000000 - Number of non-NaN NDVI values: 625530
Date: 2017-09-02T00:00:00.000000000 - Number of non-NaN NDVI values: 874742
Date: 2017-09-07T00:00:00.000000000 - Number of non-NaN NDVI values: 1180835
Date: 2017-09-12T00:00:00.000000000 - Number of non-NaN NDVI values: 1023301
Date: 2017-09-17T00:00:00.000000000 - Number of non-NaN NDVI values: 265021
Date: 2017-09-22T00:00:00.000000000 - Number of non-NaN NDVI values: 560183
Date: 2017-09-27T00:00:00.000000000 - Number of non-NaN NDVI values: 971147
Date: 2017-10-02T00:00:00.000000000 - Number of non-NaN NDVI values: 995119
Date: 2017-10-07T00:00:00.000000000 - Number of non-NaN NDVI values: 1150902
Date: 2017-10-12T00:00:00.000000000 - Number of non-NaN NDVI values: 983103
Date: 2017-10-17T00:00:00.000000000 - Number of non-NaN NDVI values: 1193391
Date: 2017-10-22T00:00:00.000000000 - Number of non-NaN NDVI values: 1383536
Date: 2017-10-27T00:00:00.000000000 - Number of non-NaN NDVI values: 969612
Date: 2017-11-01T00:00:00.000000000 - Number of non-NaN NDVI values: 869122
Date: 2017-11-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1040188
Date: 2017-11-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1093029
Date: 2017-11-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1307050
Date: 2017-11-21T00:00:00.000000000 - Number of non-NaN NDVI values: 842467
Date: 2017-11-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1031091
Date: 2017-12-01T00:00:00.000000000 - Number of non-NaN NDVI values: 700918
Date: 2017-12-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1223201
Date: 2017-12-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1007335
Date: 2017-12-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1116969
Date: 2017-12-21T00:00:00.000000000 - Number of non-NaN NDVI values: 895879
Date: 2017-12-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1105935
Date: 2017-12-31T00:00:00.000000000 - Number of non-NaN NDVI values: 640445
Date: 2018-01-05T00:00:00.000000000 - Number of non-NaN NDVI values: 895603
Date: 2018-01-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1127206
Date: 2018-01-15T00:00:00.000000000 - Number of non-NaN NDVI values: 838006
Date: 2018-01-20T00:00:00.000000000 - Number of non-NaN NDVI values: 1196178
Date: 2018-01-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1099987
Date: 2018-01-30T00:00:00.000000000 - Number of non-NaN NDVI values: 855080
Date: 2018-02-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1167105
Date: 2018-02-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1214804
Date: 2018-02-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1003651
Date: 2018-02-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1036841
Date: 2018-02-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1088853
Date: 2018-03-01T00:00:00.000000000 - Number of non-NaN NDVI values: 1247952
Date: 2018-03-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1067825
Date: 2018-03-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1000610
Date: 2018-03-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1011396
Date: 2018-03-21T00:00:00.000000000 - Number of non-NaN NDVI values: 753163
Date: 2018-03-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1265163
Date: 2018-03-31T00:00:00.000000000 - Number of non-NaN NDVI values: 1265194
Date: 2018-04-05T00:00:00.000000000 - Number of non-NaN NDVI values: 726434
Date: 2018-04-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1166807
Date: 2018-04-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1033506
Date: 2018-04-20T00:00:00.000000000 - Number of non-NaN NDVI values: 1250892
Date: 2018-04-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1450697
Date: 2018-04-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1147316
Date: 2018-05-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1359881
Date: 2018-05-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1450309
Date: 2018-05-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1416305
Date: 2018-05-20T00:00:00.000000000 - Number of non-NaN NDVI values: 1326400
Date: 2018-05-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1309329
Date: 2018-05-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1450126
Date: 2018-06-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1436022
Date: 2018-06-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1461019
Date: 2018-06-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1317819
Date: 2018-06-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1449900
Date: 2018-06-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1460091
Date: 2018-06-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1351880
Date: 2018-07-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1167428
Date: 2018-07-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1209222
Date: 2018-07-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1065911
Date: 2018-07-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1393458
Date: 2018-07-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1293343
Date: 2018-07-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1304142
Date: 2018-08-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1403274
Date: 2018-08-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1253138
Date: 2018-08-13T00:00:00.000000000 - Number of non-NaN NDVI values: 1267069
Date: 2018-08-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1309457
Date: 2018-08-23T00:00:00.000000000 - Number of non-NaN NDVI values: 1184789
Date: 2018-08-28T00:00:00.000000000 - Number of non-NaN NDVI values: 1175814
Date: 2018-09-02T00:00:00.000000000 - Number of non-NaN NDVI values: 1124725
Date: 2018-09-07T00:00:00.000000000 - Number of non-NaN NDVI values: 1296429
Date: 2018-09-12T00:00:00.000000000 - Number of non-NaN NDVI values: 1467258
Date: 2018-09-17T00:00:00.000000000 - Number of non-NaN NDVI values: 1264949
Date: 2018-09-22T00:00:00.000000000 - Number of non-NaN NDVI values: 1331015
Date: 2018-09-27T00:00:00.000000000 - Number of non-NaN NDVI values: 1311157
Date: 2018-10-02T00:00:00.000000000 - Number of non-NaN NDVI values: 877577
Date: 2018-10-07T00:00:00.000000000 - Number of non-NaN NDVI values: 842026
Date: 2018-10-12T00:00:00.000000000 - Number of non-NaN NDVI values: 946839
Date: 2018-10-17T00:00:00.000000000 - Number of non-NaN NDVI values: 1086916
Date: 2018-10-22T00:00:00.000000000 - Number of non-NaN NDVI values: 1290187
Date: 2018-10-27T00:00:00.000000000 - Number of non-NaN NDVI values: 1201566
Date: 2018-11-01T00:00:00.000000000 - Number of non-NaN NDVI values: 1211751
Date: 2018-11-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1391392
Date: 2018-11-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1180698
Date: 2018-11-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1206150
Date: 2018-11-21T00:00:00.000000000 - Number of non-NaN NDVI values: 1219878
Date: 2018-11-26T00:00:00.000000000 - Number of non-NaN NDVI values: 997135
Date: 2018-12-01T00:00:00.000000000 - Number of non-NaN NDVI values: 913512
Date: 2018-12-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1109780
Date: 2018-12-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1284869
Date: 2018-12-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1276355
Date: 2018-12-21T00:00:00.000000000 - Number of non-NaN NDVI values: 842122
Date: 2018-12-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1041945
Date: 2018-12-31T00:00:00.000000000 - Number of non-NaN NDVI values: 1402546
Date: 2019-01-05T00:00:00.000000000 - Number of non-NaN NDVI values: 766200
Date: 2019-01-10T00:00:00.000000000 - Number of non-NaN NDVI values: 572778
Date: 2019-01-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1007771
Date: 2019-01-20T00:00:00.000000000 - Number of non-NaN NDVI values: 1217585
Date: 2019-01-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1107193
Date: 2019-01-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1194286
Date: 2019-02-04T00:00:00.000000000 - Number of non-NaN NDVI values: 989976
Date: 2019-02-09T00:00:00.000000000 - Number of non-NaN NDVI values: 998637
Date: 2019-02-14T00:00:00.000000000 - Number of non-NaN NDVI values: 489968
Date: 2019-02-19T00:00:00.000000000 - Number of non-NaN NDVI values: 709547
Date: 2019-02-24T00:00:00.000000000 - Number of non-NaN NDVI values: 900770
Date: 2019-03-01T00:00:00.000000000 - Number of non-NaN NDVI values: 729155
Date: 2019-03-06T00:00:00.000000000 - Number of non-NaN NDVI values: 824081
Date: 2019-03-11T00:00:00.000000000 - Number of non-NaN NDVI values: 971608
Date: 2019-03-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1388846
Date: 2019-03-21T00:00:00.000000000 - Number of non-NaN NDVI values: 809416
Date: 2019-03-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1224888
Date: 2019-03-31T00:00:00.000000000 - Number of non-NaN NDVI values: 1188069
Date: 2019-04-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1323358
Date: 2019-04-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1162629
Date: 2019-04-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1192496
Date: 2019-04-20T00:00:00.000000000 - Number of non-NaN NDVI values: 1152134
Date: 2019-04-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1308515
Date: 2019-04-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1377644
Date: 2019-05-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1063192
Date: 2019-05-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1331953
Date: 2019-05-15T00:00:00.000000000 - Number of non-NaN NDVI values: 909097
Date: 2019-05-20T00:00:00.000000000 - Number of non-NaN NDVI values: 896371
Date: 2019-05-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1216011
Date: 2019-05-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1294520
Date: 2019-06-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1499912
Date: 2019-06-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1300057
Date: 2019-06-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1476098
Date: 2019-06-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1486866
Date: 2019-06-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1466269
Date: 2019-06-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1393771
Date: 2019-07-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1408219
Date: 2019-07-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1527057
Date: 2019-07-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1275057
Date: 2019-07-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1279632
Date: 2019-07-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1302231
Date: 2019-07-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1304670
Date: 2019-08-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1193378
Date: 2019-08-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1208343
Date: 2019-08-13T00:00:00.000000000 - Number of non-NaN NDVI values: 1463950
Date: 2019-08-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1440332
Date: 2019-08-23T00:00:00.000000000 - Number of non-NaN NDVI values: 1417718
Date: 2019-08-28T00:00:00.000000000 - Number of non-NaN NDVI values: 1319719
Date: 2019-09-02T00:00:00.000000000 - Number of non-NaN NDVI values: 1367711
Date: 2019-09-07T00:00:00.000000000 - Number of non-NaN NDVI values: 1224311
Date: 2019-09-12T00:00:00.000000000 - Number of non-NaN NDVI values: 1270193
Date: 2019-09-17T00:00:00.000000000 - Number of non-NaN NDVI values: 1275820
Date: 2019-09-22T00:00:00.000000000 - Number of non-NaN NDVI values: 1532260
Date: 2019-09-27T00:00:00.000000000 - Number of non-NaN NDVI values: 1266551
Date: 2019-10-02T00:00:00.000000000 - Number of non-NaN NDVI values: 1370441
Date: 2019-10-07T00:00:00.000000000 - Number of non-NaN NDVI values: 1414227
Date: 2019-10-12T00:00:00.000000000 - Number of non-NaN NDVI values: 1533796
Date: 2019-10-17T00:00:00.000000000 - Number of non-NaN NDVI values: 1475055
Date: 2019-10-22T00:00:00.000000000 - Number of non-NaN NDVI values: 1372160
Date: 2019-10-27T00:00:00.000000000 - Number of non-NaN NDVI values: 1337412
Date: 2019-11-01T00:00:00.000000000 - Number of non-NaN NDVI values: 1430690
Date: 2019-11-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1202601
Date: 2019-11-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1178391
Date: 2019-11-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1068666
Date: 2019-11-21T00:00:00.000000000 - Number of non-NaN NDVI values: 1107350
Date: 2019-11-26T00:00:00.000000000 - Number of non-NaN NDVI values: 626652
Date: 2019-12-01T00:00:00.000000000 - Number of non-NaN NDVI values: 1035307
Date: 2019-12-06T00:00:00.000000000 - Number of non-NaN NDVI values: 839038
Date: 2019-12-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1140745
Date: 2019-12-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1264021
Date: 2019-12-21T00:00:00.000000000 - Number of non-NaN NDVI values: 954351
Date: 2019-12-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1213662
Date: 2019-12-31T00:00:00.000000000 - Number of non-NaN NDVI values: 1039856
Date: 2020-01-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1144339
Date: 2020-01-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1292090
Date: 2020-01-15T00:00:00.000000000 - Number of non-NaN NDVI values: 910265
Date: 2020-01-20T00:00:00.000000000 - Number of non-NaN NDVI values: 1120426
Date: 2020-01-25T00:00:00.000000000 - Number of non-NaN NDVI values: 994905
Date: 2020-01-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1256509
Date: 2020-02-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1197279
Date: 2020-02-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1106574
Date: 2020-02-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1169250
Date: 2020-02-19T00:00:00.000000000 - Number of non-NaN NDVI values: 854661
Date: 2020-02-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1367902
Date: 2020-02-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1191210
Date: 2020-03-05T00:00:00.000000000 - Number of non-NaN NDVI values: 936542
Date: 2020-03-10T00:00:00.000000000 - Number of non-NaN NDVI values: 664485
Date: 2020-03-15T00:00:00.000000000 - Number of non-NaN NDVI values: 638823
Date: 2020-03-20T00:00:00.000000000 - Number of non-NaN NDVI values: 893348
Date: 2020-03-25T00:00:00.000000000 - Number of non-NaN NDVI values: 722596
Date: 2020-03-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1304162
Date: 2020-04-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1150964
Date: 2020-04-09T00:00:00.000000000 - Number of non-NaN NDVI values: 830741
Date: 2020-04-14T00:00:00.000000000 - Number of non-NaN NDVI values: 884877
Date: 2020-04-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1015297
Date: 2020-04-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1298293
Date: 2020-04-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1476600
Date: 2020-05-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1377043
Date: 2020-05-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1313016
Date: 2020-05-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1215302
Date: 2020-05-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1430985
Date: 2020-05-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1420325
Date: 2020-05-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1123239
Date: 2020-06-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1265310
Date: 2020-06-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1492647
Date: 2020-06-13T00:00:00.000000000 - Number of non-NaN NDVI values: 1310120
Date: 2020-06-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1530476
Date: 2020-06-23T00:00:00.000000000 - Number of non-NaN NDVI values: 1320043
Date: 2020-06-28T00:00:00.000000000 - Number of non-NaN NDVI values: 1396954
Date: 2020-07-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1350658
Date: 2020-07-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1376557
Date: 2020-07-13T00:00:00.000000000 - Number of non-NaN NDVI values: 1220621
Date: 2020-07-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1125643
Date: 2020-07-23T00:00:00.000000000 - Number of non-NaN NDVI values: 953489
Date: 2020-07-28T00:00:00.000000000 - Number of non-NaN NDVI values: 1419714
Date: 2020-08-02T00:00:00.000000000 - Number of non-NaN NDVI values: 1421451
Date: 2020-08-07T00:00:00.000000000 - Number of non-NaN NDVI values: 1328570
Date: 2020-08-12T00:00:00.000000000 - Number of non-NaN NDVI values: 1309194
Date: 2020-08-17T00:00:00.000000000 - Number of non-NaN NDVI values: 1359476
Date: 2020-08-22T00:00:00.000000000 - Number of non-NaN NDVI values: 1393558
Date: 2020-08-27T00:00:00.000000000 - Number of non-NaN NDVI values: 1336536
Date: 2020-09-01T00:00:00.000000000 - Number of non-NaN NDVI values: 1426648
Date: 2020-09-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1178498
Date: 2020-09-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1284034
Date: 2020-09-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1338585
Date: 2020-09-21T00:00:00.000000000 - Number of non-NaN NDVI values: 1427975
Date: 2020-09-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1398051
Date: 2020-10-01T00:00:00.000000000 - Number of non-NaN NDVI values: 1322598
Date: 2020-10-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1301989
Date: 2020-10-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1450704
Date: 2020-10-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1411401
Date: 2020-10-21T00:00:00.000000000 - Number of non-NaN NDVI values: 1209509
Date: 2020-10-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1298224
Date: 2020-10-31T00:00:00.000000000 - Number of non-NaN NDVI values: 1208211
Date: 2020-11-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1077463
Date: 2020-11-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1332641
Date: 2020-11-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1238394
Date: 2020-11-20T00:00:00.000000000 - Number of non-NaN NDVI values: 976203
Date: 2020-11-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1442804
Date: 2020-11-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1468960
Date: 2020-12-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1291332
Date: 2020-12-10T00:00:00.000000000 - Number of non-NaN NDVI values: 783388
Date: 2020-12-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1309828
Date: 2020-12-20T00:00:00.000000000 - Number of non-NaN NDVI values: 1267438
Date: 2020-12-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1025053
Date: 2020-12-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1428678
Date: 2021-01-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1354112
Date: 2021-01-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1294689
Date: 2021-01-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1265808
Date: 2021-01-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1047103
Date: 2021-01-24T00:00:00.000000000 - Number of non-NaN NDVI values: 849325
Date: 2021-01-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1155481
Date: 2021-02-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1314973
Date: 2021-02-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1222713
Date: 2021-02-13T00:00:00.000000000 - Number of non-NaN NDVI values: 655904
Date: 2021-02-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1278593
Date: 2021-02-23T00:00:00.000000000 - Number of non-NaN NDVI values: 1235474
Date: 2021-02-28T00:00:00.000000000 - Number of non-NaN NDVI values: 1326062
Date: 2021-03-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1105634
Date: 2021-03-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1051033
Date: 2021-03-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1129931
Date: 2021-03-20T00:00:00.000000000 - Number of non-NaN NDVI values: 810931
Date: 2021-03-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1385991
Date: 2021-03-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1366627
Date: 2021-04-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1347199
Date: 2021-04-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1272275
Date: 2021-04-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1160426
Date: 2021-04-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1100122
Date: 2021-04-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1220997
Date: 2021-04-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1184735
Date: 2021-05-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1507479
Date: 2021-05-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1369440
Date: 2021-05-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1298532
Date: 2021-05-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1284081
Date: 2021-05-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1475327
Date: 2021-05-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1242700
Date: 2021-06-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1249449
Date: 2021-06-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1468316
Date: 2021-06-13T00:00:00.000000000 - Number of non-NaN NDVI values: 1488158
Date: 2021-06-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1420875
Date: 2021-06-23T00:00:00.000000000 - Number of non-NaN NDVI values: 938676
Date: 2021-06-28T00:00:00.000000000 - Number of non-NaN NDVI values: 1108215
Date: 2021-07-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1325758
Date: 2021-07-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1397584
Date: 2021-07-13T00:00:00.000000000 - Number of non-NaN NDVI values: 1312019
Date: 2021-07-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1160595
Date: 2021-07-23T00:00:00.000000000 - Number of non-NaN NDVI values: 1072330
Date: 2021-07-28T00:00:00.000000000 - Number of non-NaN NDVI values: 1233585
Date: 2021-08-02T00:00:00.000000000 - Number of non-NaN NDVI values: 1425644
Date: 2021-08-07T00:00:00.000000000 - Number of non-NaN NDVI values: 1189717
Date: 2021-08-12T00:00:00.000000000 - Number of non-NaN NDVI values: 1130541
Date: 2021-08-17T00:00:00.000000000 - Number of non-NaN NDVI values: 1244330
Date: 2021-08-22T00:00:00.000000000 - Number of non-NaN NDVI values: 1429885
Date: 2021-08-27T00:00:00.000000000 - Number of non-NaN NDVI values: 1400988
Date: 2021-09-01T00:00:00.000000000 - Number of non-NaN NDVI values: 1235652
Date: 2021-09-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1490848
Date: 2021-09-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1309929
Date: 2021-09-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1493453
Date: 2021-09-21T00:00:00.000000000 - Number of non-NaN NDVI values: 1352477
Date: 2021-09-26T00:00:00.000000000 - Number of non-NaN NDVI values: 850782
Date: 2021-10-01T00:00:00.000000000 - Number of non-NaN NDVI values: 1189767
Date: 2021-10-06T00:00:00.000000000 - Number of non-NaN NDVI values: 946703
Date: 2021-10-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1308464
Date: 2021-10-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1324177
Date: 2021-10-21T00:00:00.000000000 - Number of non-NaN NDVI values: 840014
Date: 2021-10-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1342931
Date: 2021-10-31T00:00:00.000000000 - Number of non-NaN NDVI values: 1108334
Date: 2021-11-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1184649
Date: 2021-11-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1323340
Date: 2021-11-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1204549
Date: 2021-11-20T00:00:00.000000000 - Number of non-NaN NDVI values: 704829
Date: 2021-11-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1371307
Date: 2021-11-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1298220
Date: 2021-12-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1085287
Date: 2021-12-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1110481
Date: 2021-12-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1208832
Date: 2021-12-20T00:00:00.000000000 - Number of non-NaN NDVI values: 904836
Date: 2021-12-25T00:00:00.000000000 - Number of non-NaN NDVI values: 574242
Date: 2021-12-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1169220
Date: 2022-01-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1189770
Date: 2022-01-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1309437
Date: 2022-01-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1278740
Date: 2022-01-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1296185
Date: 2022-01-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1404603
Date: 2022-01-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1021273
Date: 2022-02-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1339798
Date: 2022-02-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1361600
Date: 2022-02-13T00:00:00.000000000 - Number of non-NaN NDVI values: 1395969
Date: 2022-02-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1226100
Date: 2022-02-23T00:00:00.000000000 - Number of non-NaN NDVI values: 1288531
Date: 2022-02-28T00:00:00.000000000 - Number of non-NaN NDVI values: 1384529
Date: 2022-03-05T00:00:00.000000000 - Number of non-NaN NDVI values: 1035053
Date: 2022-03-10T00:00:00.000000000 - Number of non-NaN NDVI values: 1245686
Date: 2022-03-15T00:00:00.000000000 - Number of non-NaN NDVI values: 1231153
Date: 2022-03-20T00:00:00.000000000 - Number of non-NaN NDVI values: 1341335
Date: 2022-03-25T00:00:00.000000000 - Number of non-NaN NDVI values: 1199186
Date: 2022-03-30T00:00:00.000000000 - Number of non-NaN NDVI values: 1186419
Date: 2022-04-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1534061
Date: 2022-04-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1214160
Date: 2022-04-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1230675
Date: 2022-04-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1014514
Date: 2022-04-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1256100
Date: 2022-04-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1460838
Date: 2022-05-04T00:00:00.000000000 - Number of non-NaN NDVI values: 1197087
Date: 2022-05-09T00:00:00.000000000 - Number of non-NaN NDVI values: 1476932
Date: 2022-05-14T00:00:00.000000000 - Number of non-NaN NDVI values: 1490087
Date: 2022-05-19T00:00:00.000000000 - Number of non-NaN NDVI values: 1262806
Date: 2022-05-24T00:00:00.000000000 - Number of non-NaN NDVI values: 1448367
Date: 2022-05-29T00:00:00.000000000 - Number of non-NaN NDVI values: 1476842
Date: 2022-06-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1192456
Date: 2022-06-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1402978
Date: 2022-06-13T00:00:00.000000000 - Number of non-NaN NDVI values: 1453343
Date: 2022-06-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1217817
Date: 2022-06-23T00:00:00.000000000 - Number of non-NaN NDVI values: 1181398
Date: 2022-06-28T00:00:00.000000000 - Number of non-NaN NDVI values: 1409386
Date: 2022-07-03T00:00:00.000000000 - Number of non-NaN NDVI values: 1307337
Date: 2022-07-08T00:00:00.000000000 - Number of non-NaN NDVI values: 1471812
Date: 2022-07-13T00:00:00.000000000 - Number of non-NaN NDVI values: 1392438
Date: 2022-07-18T00:00:00.000000000 - Number of non-NaN NDVI values: 1367636
Date: 2022-07-23T00:00:00.000000000 - Number of non-NaN NDVI values: 1298967
Date: 2022-07-28T00:00:00.000000000 - Number of non-NaN NDVI values: 975686
Date: 2022-08-02T00:00:00.000000000 - Number of non-NaN NDVI values: 1227866
Date: 2022-08-07T00:00:00.000000000 - Number of non-NaN NDVI values: 1220408
Date: 2022-08-12T00:00:00.000000000 - Number of non-NaN NDVI values: 1253823
Date: 2022-08-17T00:00:00.000000000 - Number of non-NaN NDVI values: 971412
Date: 2022-08-22T00:00:00.000000000 - Number of non-NaN NDVI values: 1262454
Date: 2022-08-27T00:00:00.000000000 - Number of non-NaN NDVI values: 1391227
Date: 2022-09-01T00:00:00.000000000 - Number of non-NaN NDVI values: 1434055
Date: 2022-09-06T00:00:00.000000000 - Number of non-NaN NDVI values: 1347668
Date: 2022-09-11T00:00:00.000000000 - Number of non-NaN NDVI values: 1164327
Date: 2022-09-16T00:00:00.000000000 - Number of non-NaN NDVI values: 1358083
Date: 2022-09-21T00:00:00.000000000 - Number of non-NaN NDVI values: 1160230
Date: 2022-09-26T00:00:00.000000000 - Number of non-NaN NDVI values: 1369278
Date: 2022-10-01T00:00:00.000000000 - Number of non-NaN NDVI values: 908823
Date: 2022-10-06T00:00:00.000000000 - Number of non-NaN NDVI values: 954979

Additionally, we plot the date of each pixel's first non-NaN value across all cubes. For most pixels, the first non-NaN value occurs on 2017-01-15, and the latest first occurrence is on 2017-05-05. This might motivate us to use 2017-05-05 as the starting date for our training time series.

# List to store the first non-NaN time indices for all pixels
first_non_nan_indices = []

# Loop over all NetCDF files
for nc_file in nc_files:
    nc_path = os.path.join(file_path, nc_file)
    
    # Load the current NetCDF file
    data = xr.open_dataset(nc_path)
    ndvi_data = data['NDVI'].values
    
    # Get the size of the data
    time_len, x_len, y_len = ndvi_data.shape
    
    # Find the first non-NaN time index for each pixel
    for x in range(x_len):
        for y in range(y_len):
            first_non_nan_time_index = np.nan
            for t in range(time_len):
                if not np.isnan(ndvi_data[t, x, y]):
                    first_non_nan_time_index = t
                    break
            first_non_nan_indices.append(first_non_nan_time_index)

# Convert time indices to date values
start_date = pd.Timestamp('2016-01-01')
dates = [start_date + pd.Timedelta(days=int(index) * 5) for index in first_non_nan_indices if not np.isnan(index)]

# Count occurrences of each date
date_counts = pd.Series(dates).value_counts().sort_index()

# Plot the distribution of first non-NaN dates as a bar plot
plt.figure(figsize=(12, 6))
plt.bar(date_counts.index, date_counts.values, edgecolor='black', alpha=0.7)
plt.title('Distribution of First Non-NaN NDVI Dates for All Pixels')
plt.xlabel('Date')
plt.ylabel('Count of Pixels')
plt.grid(True)

plt.xticks(rotation=45)  # Rotate date labels for better readability
plt.tight_layout()
plt.show()

# Calculate the median of the date values
median_date_num = np.median([date.toordinal() for date in dates])
median_date = pd.Timestamp.fromordinal(int(median_date_num))
median_date
Timestamp('2017-01-15 00:00:00')
max(dates)
Timestamp('2017-05-05 00:00:00')

Analyzing the distribution of non-NaN NDVI values over time reveals recurring intervals in which fewer than 600,000 of the possible 128 x 128 x 100 = 1,638,400 pixels (less than 40%) contain valid values for a given 5-day measurement period. This highlights the poor data quality and suggests that the prediction performance of our time series models will be significantly degraded by this substantial amount of missing data.

However, we have chosen to start our analysis from July 2017, as the first period of missing data (< 600k valid pixel values) ends here, and the next three measurement dates contain more than 1,000,000 valid pixels. As depicted by the red LOESS trend line in the graph, the number of valid pixels reaches a higher and more consistent level starting from this point, except for some subsequent periods with a low number of valid pixel values, particularly at the end of 2018 and the beginning of 2019.

Thus, we set our training period from July 4, 2017, to June 28, 2021, and the test period from July 3, 2021, to October 6, 2022. This ensures we have three complete seasonal cycles (apart from the periods with few data) available for training our models.

import matplotlib.dates as mdates
import statsmodels.api as sm

# Extract data for plotting
dates = list(results.keys())
non_nan_counts = list(results.values())

# Convert dates to a readable format if they are in string format
if isinstance(dates[0], str):
    dates = [np.datetime64(date) for date in dates]

# Convert dates to a numeric format for LOESS
numeric_dates = mdates.date2num(dates)

# Fit LOESS model
loess = sm.nonparametric.lowess(non_nan_counts, numeric_dates, frac=0.1)

# Plot the results as a histogram and a LOESS smoothed line
plt.figure(figsize=(10, 6))
plt.bar(dates, non_nan_counts, alpha=0.6, label='Non-NaN NDVI values')
plt.plot(dates, loess[:, 1], color='red', linewidth=2, linestyle='--', alpha=0.7, label='LOESS trend')

plt.xlabel('Time')
plt.ylabel('Number of non-NaN NDVI values')
plt.title('Number of non-NaN NDVI values over time')
plt.xticks(rotation=45)
plt.grid(True)

# Set x-axis to display quarterly ticks
ax = plt.gca()
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # Set major ticks to every 3 months (quarterly)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%m %Y'))  # Format the ticks to show the month and year

# Set y-axis lower bound to 500,000
plt.ylim(bottom=500000)

plt.xticks(rotation=45)  # Rotate date labels for better readability
plt.tight_layout()
plt.legend()
plt.show()

2.5.2.2 5.2 Perform Train and Test Split

In this step, we will split the time series data into separate train and test datasets for each cube.

# Define output directories and make sure they exist
output_folder_train = base_dir / "data" / "data_train"
output_folder_test = base_dir / "data" / "data_test"
os.makedirs(output_folder_train, exist_ok=True)
os.makedirs(output_folder_test, exist_ok=True)
# Perform Train-Test Split and Save in Separate Files
for idx, nc_file in enumerate(nc_files, start=1):
    full_path = os.path.join(file_path, nc_file)
    
    # Load the NetCDF file
    dataset = xr.open_dataset(full_path)
    
    # Train-Test Split
    train_data = dataset.sel(time=slice('2017-07-04', '2021-06-28'))
    test_data = dataset.sel(time=slice('2021-07-03', '2022-10-06'))
    
    # Generate the new filenames and full paths for the output files
    base_filename = os.path.splitext(os.path.basename(nc_file))[0]  # Remove the .nc extension
    train_filename = os.path.join(output_folder_train, f'{base_filename}_train.nc')
    test_filename = os.path.join(output_folder_test, f'{base_filename}_test.nc')

    # Save Train and Test Data
    train_data.to_netcdf(train_filename)
    test_data.to_netcdf(test_filename)
 
    print(f"[{idx}/{len(nc_files)}] Train and test datasets saved for {os.path.basename(nc_file)}")

2.5.3 6. Prepare Data for Final Training

Given the significant number of NaN values in our dataset and time series, we have decided to implement two distinct approaches to effectively handle these missing values.

Approach A: Handling NaNs with Outliers and Cloud Mask Integration

  1. Replace NaNs with an Outlier:
    • We will replace NaN values with a clearly identifiable outlier value (e.g., -9999.0). This allows the model to recognize and handle these outliers during training.
  2. Use Cloud Mask as an Exogenous Feature:
    • In addition, we will incorporate the cloud mask as an exogenous feature in our model. This helps the model learn the relationship between cloud coverage (as indicated by the cloud mask for a specific pixel during a specific time period) and the outlier value (-9999.0). In this way, the model can learn that a -9999.0 value is likely due to cloud presence, improving its ability to make accurate predictions despite these outliers (see the sketch below).
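
To make the role of the cloud mask more concrete, the following is a minimal, hedged sketch of how, once the Approach A files from section 6.1 exist, the NDVI series and the cloud mask of a single pixel could be stacked into a (timesteps, features) input matrix. The directory layout and variable names follow the preparation code further below; the pixel choice is arbitrary.

import os
import numpy as np
import xarray as xr

# Hypothetical sketch: feature 0 is NDVI (NaNs already replaced by -9999.0),
# feature 1 is the cloud mask used as an exogenous variable.
prepared_dir = base_dir / "data" / "data_A_9999"
sample_file = sorted(os.listdir(prepared_dir))[0]
ds = xr.open_dataset(os.path.join(prepared_dir, sample_file))

ndvi = ds['NDVI'].isel(x=0, y=0).values          # shape: (time,)
cloud = ds['Cloud_Mask'].isel(x=0, y=0).values   # shape: (time,)

features = np.stack([ndvi, cloud], axis=-1)      # shape: (time, 2)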

Approach B: Interpolation with STL Decomposition

  1. STL Interpolation:

    • In this approach, we will use Seasonal-Trend decomposition using Loess (STL) to interpolate NaN values. STL decomposes the time series into seasonal, trend, and residual components, allowing for a more nuanced interpolation of missing values.
  2. Why Use STL Interpolation:

    • Captures Seasonal and Trend Components:
      • STL is particularly effective for time series data with strong seasonal and trend components, which is the case for our NDVI data. By decomposing the data, STL can accurately interpolate missing values by considering the underlying seasonal patterns and long-term trends.
    • Robust to Outliers:
      • STL is robust to outliers and can provide a more reliable interpolation compared to simpler methods, which might be biased by irregularities in the data.
    • Flexible and Adaptable:
      • STL is flexible and can handle a wide range of time series characteristics, making it a versatile choice for our dataset with its complex patterns and missing values.

2.5.3.1 6.1 Prepare Data for Approach A

# Define input and output directories
file_path = base_dir / "data" / "data_train"
output_path = base_dir / "data" / "data_A_9999"
os.makedirs(output_path, exist_ok=True)
nc_files = [os.path.join(file_path, f) for f in os.listdir(file_path) if f.endswith('.nc')]

# Iterate over each .nc file and replace NaNs in the NDVI variable with -9999
for idx, nc_file in enumerate(nc_files, start=1):
    # Load the NetCDF file
    dataset = xr.open_dataset(nc_file)
    
    # Replace NaNs in the NDVI variable with -9999
    dataset['NDVI'] = dataset['NDVI'].fillna(-9999)
    
    # Generate the new filename and the full path for the output file
    output_file = os.path.join(output_path, f'ds_A_{os.path.basename(nc_file)}')
    
    # Save the modified dataset
    dataset.to_netcdf(output_file)
    
    print(f"[{idx}/{len(nc_files)}] Cube: {os.path.basename(nc_file)} | NaNs replaced and saved")

2.5.3.2 6.2 Prepare Data for Approach B

STL Interpolation
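
The loop below relies on a helper stl_interpolate. In case it is not already defined earlier in the notebook, the following is a minimal sketch of what such a helper could look like, built on statsmodels' STL and assuming a seasonal period of 73 timesteps (365 days at 5-day sampling); the provisional gap filling before the STL fit is an assumption of this sketch.

import xarray as xr
from statsmodels.tsa.seasonal import STL

def stl_interpolate(ndvi_pixel, period=73):
    # Sketch: fill NaNs in a single-pixel NDVI series using an STL fit.
    # Gaps are first bridged with a simple time-based interpolation so that
    # STL can be fitted, then trend + seasonal values replace the originally
    # missing observations while observed values are kept untouched.
    series = ndvi_pixel.to_series()                  # pandas Series indexed by time
    nan_mask = series.isna()
    filled = series.interpolate(method='time').ffill().bfill()
    fit = STL(filled, period=period, robust=True).fit()
    fitted = fit.trend + fit.seasonal
    series.loc[nan_mask] = fitted.loc[nan_mask]
    return xr.DataArray(series.values, coords=ndvi_pixel.coords, dims=ndvi_pixel.dims)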

# Paths to the input and output directories for the NetCDF files
file_path = base_dir / "data" / "data_train"
output_path = base_dir / "data" / "data_B_interpolated"
os.makedirs(output_path, exist_ok=True)
# List all NetCDF files in the input directory
nc_files = [f for f in os.listdir(file_path) if f.endswith('.nc')]

# Iterate over each file
for nc_file in nc_files:
    full_path = os.path.join(file_path, nc_file)
    ds = xr.open_dataset(full_path)
    
    interpolated_data = []
    
    # Iterate over each pixel and apply STL interpolation
    for x in ds.x:
        for y in ds.y:
            ndvi_pixel = ds['NDVI'].sel(x=x, y=y)
            if ndvi_pixel.isnull().all():
                # print(f"Pixel (x={x}, y={y}) contains only NaNs and will be skipped.")
                interpolated_data.append(np.full(ndvi_pixel.shape, np.nan))  # Append NaN array
            else:
                interpolated_ndvi = stl_interpolate(ndvi_pixel)
                interpolated_data.append(interpolated_ndvi.values)
    
    # Reshape the interpolated data to match the original dimensions
    interpolated_ndvi_array = np.array(interpolated_data).reshape((ds.sizes['x'], ds.sizes['y'], ds.sizes['time'])).transpose(2, 0, 1)
    
    # Create a new DataArray with the interpolated NDVI values
    interpolated_da = xr.DataArray(interpolated_ndvi_array, coords=[ds.time, ds.x, ds.y], dims=['time', 'x', 'y'], name='NDVI')
    
    # Create a new Dataset with the interpolated NDVI values and original attributes
    new_ds = xr.Dataset({'NDVI': interpolated_da, 'Cloud_Mask': ds['Cloud_Mask']}, coords=ds.coords, attrs=ds.attrs)
    
    # Save the new dataset to a NetCDF file
    output_file = os.path.join(output_path,  f'ds_B_{os.path.basename(nc_file)}')
    new_ds.to_netcdf(output_file)
    
    print(f"Interpolated file saved: {output_file}")

2.6 Challenges in Data Preparation

The data preparation process was extremely time-consuming and challenging. This notebook is the final result of various approaches, ideas, and steps. A significant portion of our work involved experimenting with different strategies that do not appear in this notebook because they did not work out. We encountered several key challenges:

  1. Data Volume: Initially, we started with approximately 5000 cubes. However, executing some steps, such as the initial NDVI calculation based on Sentinel-2 data, was infeasible due to limited computing power, leading to frequent process failures. Consequently, we reduced the dataset to 100 cubes, selecting those with the fewest missing values. Despite this reduction, many steps, like interpolating missing values, were time-intensive and often failed due to memory constraints, prolonging our work.

  2. Data Format: None of us had prior experience with DataCubes, requiring time to understand the data format and how it was stored and loaded (see src/data_processing/my_loader.py). The lack of proper documentation for the DataLoader and MiniCubes meant we had to rely heavily on trial and error to correctly load the data in the desired format.

  3. Data Quality and Missing Data: The dataset contained many missing values, attributed to Sentinel-2 periods with missing band values and to pixels masked by the cloud mask. After deliberation, we decided to exclude the entire year 2016 due to its extensive missing values, despite metadata indicating that the measurement period started on January 1, 2016. Further investigation revealed periodic intervals with few valid pixel values, which are detrimental to time series prediction and will definitely impact our models’ predictive performance negatively. To address missing values, we tested two approaches:

    • Setting Missing Values to an Outlier: Assigning missing values an extreme outlier value of -9999.0 (valid NDVI range: -1.0 to 1.0) and including the cloud mask as an exogenous factor to help the model learn the relationship between cloud presence and these outliers. However, cloud cover is not the only cause of missing values; they also occur without any cloud indication.
    • Interpolation of Missing Values: Using STL decomposition for interpolation, which is particularly suitable for our data. This method was computationally intensive and time-consuming. To avoid backward-fill interpolation and to interpolate missing values accurately, we restricted the time period to start from a point with relatively many valid pixels (July 4, 2017). Additionally, we decided to interpolate missing values only in the training data and to evaluate model performance against the actual test data (predicted vs. true test values), ignoring missing values in the test data to avoid distorting the evaluation metrics further.

2.7 Challenges in Modelling

During the process of building the models, we encountered several problems.

  1. Data for Approach A

    Procedure for the LSTM model:

    After the model was created and the predictions were made, the data was denormalized. When looking at the results, we noticed that all predicted values were disproportionately high (> 0.99). Given the natural NDVI range of -1 to 1 and a comparison with the baseline data, we concluded that these results were implausible and must be incorrect. Unfortunately, due to the limited time and the long processing times of the code, we were ultimately unable to determine the exact cause or find an appropriate solution. It is possible that the masking of the -9999 values did not work correctly, so that the model treated the -9999 values as “normal” values. Incorrect normalization is another possible cause.

    To preserve comparability across models, we then decided to exclude this dataset entirely from our project.

  2. Reducing to Complete Cubes: Furthermore, our initial aim was to use all 100 cubes for our project. This quickly proved impossible with the memory and computing power available to us. Despite using a GPU, only very few cubes could be processed. We therefore had to reduce our data basis to 4 cubes. For this, we sorted the 100 cubes in descending order of completeness, where complete means that a cube contains as few pixels as possible with NaN values across all timesteps. The 4 most complete cubes are therefore the following:

    1. Cube 665
    2. Cube 80
    3. Cube 1203
    4. Cube 1301

    For the Random Forest model, we were ultimately only able to use one cube because of the model’s extremely long processing time. Further information can be found in the